Machine learning prediction of methane, nitrogen, and natural gas mixture viscosities under normal and harsh conditions

The accurate estimation of gas viscosity remains a pivotal concern for petroleum engineers, exerting substantial influence on the modeling efficacy of natural gas operations. Due to their time-consuming and costly nature, experimental measurements of gas viscosity are challenging. Data-based machine learning (ML) techniques afford a resourceful and less exhausting substitution, aiding research and industry at gas modeling that is incredible to reach in the laboratory. Statistical approaches were used to analyze the experimental data before applying machine learning. Seven machine learning techniques specifically Linear Regression, random forest (RF), decision trees, gradient boosting, K-nearest neighbors, Nu support vector regression (NuSVR), and artificial neural network (ANN) were applied for the prediction of methane (CH4), nitrogen (N2), and natural gas mixture viscosities. More than 4304 datasets from real experimental data utilizing pressure, temperature, and gas density were employed for developing ML models. Furthermore, three novel correlations have developed for the viscosity of CH4, N2, and composite gas using ANN. Results revealed that models and anticipated correlations predicted methane, nitrogen, and natural gas mixture viscosities with high precision. Results designated that the ANN, RF, and gradient Boosting models have performed better with a coefficient of determination (R2) of 0.99 for testing data sets of methane, nitrogen, and natural gas mixture viscosities. However, linear regression and NuSVR have performed poorly with a coefficient of determination (R2) of 0.07 and − 0.01 respectively for testing data sets of nitrogen viscosity. Such machine learning models offer the industry and research a cost-effective and fast tool for accurately approximating the viscosities of methane, nitrogen, and gas mixture under normal and harsh conditions.

between the reduced density (ρ r ) and the residual viscosity modulus ( (µ g − µ * )ε is the basis of the suggested correlation and the gas properties including P c , T c , density, and molecular weight are considered inputs.Lohrenz et al. 18 developed an empirical correlation referred to as LBC for gas mixture viscosity estimation.Their model used the same equation and coefficient for pure fluid in the Jossi et al. 17 correlation and they concluded that their model is highly reliant on the density measurements 2 .Dempsey 19 approximates the gas viscosity ratio at a specified pressure to the gas viscosity at ambient conditions.Lee et al. 20 derived a correlation relevant to gas density, temperature, and gas molecular weight to forecast gas viscosities at reservoir conditions 2 .The correlation is accurate for estimating the viscosity of natural gas below HPHT.Londono 21 optimized both the JST and LGE correlations using a total of 13,656 data points and anticipated two models for gas viscosity prediction.Then another new optimized empirical formula for gas viscosity relevant to the gas density and temperature was developed.Jeje and Mattar 22 paralleled the LGE correlation to Carr et al. 16 for sour and sweet gases.The results showed that the two correlations give identical results for sweet gases.Whereas, for sour gases, there is a significant difference between the compared correlations.Sutton 23 correlated viscosities of gas condensate and associated gases as a function of CH 4 , C 3 H 8 , CH 4 /C 3 H 8 , CH 4 /C 4 H 10 , CH 4 /C 10 H 22 , and natural gas viscosity.They estimated the gas condensate behavior by utilizing the CH 4 /C 10 H 22 binary mixtures.Viswanathan 24 implemented NIST values at (T = 100-400°F and P = 5000 to 30,000 psia) to improve LGE correlation at HPHT conditions.Ohirhian and Abu 25 estimated the natural gas viscosity using experimental data values obtained from routine PVT analysis of the Nigerian crude oil at reservoir conditions (P res = 144-4100 psia and T res = 130-220 °F).The correlation was also compared with the viscosity values derived from Carr et al. 16 charts and Lee et al. 20 viscosity equation, and the results revealed that the new correlation showed precise results.
Recently, computational techniques, mathematical models, and equations of states (EOSs) have been implemented in various industrial applications including petroleum processes 26,27 .Various predictive models including conventional and hybrid ANNs models, and adaptive neuro-fuzzy inference system models 6 have been estimated.Londono et al. 28 established a gas viscosity correlation relevant to gas density, and reservoir temperature, considering non-hydrocarbon compounds including carbon dioxide, helium, and nitrogen, with an AARE of 3.05%.Quraishi and Shokir 2 developed two gas viscosity and density models through genetic programming (GP) and alternating conditional expectation (ACE) algorithms using 4445 datasets with an AARE of 3.95% and 5.4%, respectively.Yang et al. 29 predict gas viscosity for gas samples containing high carbon dioxide concentrations.The formulated correlation is built with 1,539 data points of 9 natural gas samples measured at 250 to 450 K and 0.10 to 140 MPa.The correlation reported an (ARE < 0.98%) between the calculated and the experimental values.Dargahi-Zarandi et al. 3 developed a robust model algorithm based on the GMDH neural network for estimating sour and natural gas viscosity as a function of T pr , P pr , molecular weight, and density using more than 3800 data points reported in the literature.Rostami et al. 1 modeled the natural gas viscosities using integrated smart intelligent strategies with optimized algorithms, where more than 3800 data points were screened for their (T pr , P pr , and molecular weight) to develop the required models.
The fuzziness, non-linearity, and complexity of the reservoir behavior necessitate a prevailing tool to overwhelm these challenges 6,12 .In this regard, soft computations and machine learning gained incremental spread in petroleum and chemical engineering issues 12,[30][31][32][33] .These approaches include various predictive models such as ANN, GP, ANFIS, SVM, and their optimized hybrids 1, [34][35][36] .ANNs used to outperform multivariate nonlinear regression processes 37 .This work introduces the utilization of machine learning algorithms to precisely estimate the viscosities of CH 4 , N 2 , and gas mixture under normal and harsh conditions relevant to pressure, temperature, and gas density based on the measured PVT database reported in the literature to develop and test the models.The network's prediction was validated against the published one.The results displayed better performance for the developed models concerning viscosities of CH 4 , N 2 , and gas mixture.Seven machine learning techniques specifically Linear Regression, Random Forest (RF), Decision Trees (DT), Gradient Boosting, K-Nearest Neighbors (KNN), Nu Support Vector Regression (NuSVR), and artificial neural network (ANN) were applied for the prediction of methane (CH4), nitrogen (N 2 ), and natural gas mixture viscosities.The reasons behind using the considered ML types in modeling the gas viscosity lie in the viscosities of gas mixtures exhibit non-linear behavior under different conditions.Moreover, using machine learning models can help uncover and model these complicated relationships.The considered Machine learning models offer flexibility in modeling different types of data and handling various inputs.Machine learning models can handle multiple input variables and their interactions efficiently, potentially providing accurate predictions across diverse scenarios.Furthermore, machine learning is a data-driven approach, meaning it can learn patterns and relationships directly from the available data.
Unlike most of the literature on ML models, the viscosity of three distinct gases is predicted using seven different machine-learning techniques.Previous research focuses only on the modeling of gas mixture viscosity 2,38 .In addition, seven different machine-learning techniques were utilized for the prediction of gas viscosity.The previous research was mostly directed only at artificial neural networks (ANNs) and adaptive neuro-fuzzy inference systems (ANFIS) 1,2,39,40 .To the best of the author's knowledge limited studies have applied ML techniques for Methane and Nitrogen viscosity prediction.Almost all viscosity models developed using ML algorithms are suitable for gas mixtures.Moreover, no work has been quoted in the literature that predicts methane, nitrogen, and natural gas mixture viscosities under normal and harsh conditions using seven distinct machine-learning techniques.

Materials and methods
This article utilizes statistical approaches for analyzing the results of published PVT experiments for CH 4 , N 2 , and gas mixture at standard conditions as well as severe conditions relevant to high temperature and pressure.The goal is to clarify the significance of important elements (pressure, temperature, and gas density) on the viscosity of CH 4 , N 2 , and gas mixture during PVT experiments.The research's methodological approach is shown in Fig. 1.Initially, the published experimental PVT data and results were gathered.Additionally, the database was shown and explained using a histogram including pressure, temperature, and gas density together with the viscosity of CH 4 , N 2 , and gas mixture.Moreover, relationships between viscosity versus pressure, temperature, and gas density were established using sensitivity analysis, analysis of variance, factorial design, contour plot, and main effects plot.Finally, ML algorithms were applied to predict the viscosity of CH 4 , N 2 , and gas mixture at standard and harsh conditions of temperature and pressure based on PVT data experiments.
The novelty of this research exists in the approach to present an advanced Machine learning model to predict methane, nitrogen, and natural gas mixture viscosities under a wide range of data.More than 4304 datasets from real experimental data utilizing pressure, temperature, and gas density were employed for developing ML models.Moreover, the article exhibits additional statistical approaches by which relationships between viscosity versus pressure, temperature, and gas density were established.The article also provides three new ANN-based correlations related to methane, nitrogen, and natural gas mixture viscosities.Therefore, this research is remarkable in its approach, as it extensively predicts methane, nitrogen, and natural gas mixture viscosities using seven distinct machine-learning algorithms for the first time.

Exploratory data analysis
Establishing an accurate predictive machine learning model (ML) strongly relies on the quality of the input dataset.The accurate prediction capability of ML models is determined by the effect of input feature significance on the output parameters.Analysis of variance (ANOVA) test was employed using Minitab software, to analyze several variables of the generated model.The relationship between the viscosities of the methane, nitrogen, and natural gas mixture and the number of independent variables was examined using the values of P and F. If the P-value < 0.05, the factor's statistical importance is proven.To highlight the impact of different independent parameters even more on the methane, nitrogen, and natural gas mixture viscosities, factorial, contour plots, and main effects plots were analyzed.

Machine learning (ML) models
Seven machine learning algorithms have been applied to predict methane, nitrogen, and natural gas mixture viscosities under normal and harsh conditions: Linear Regression, Decision Trees (DT), Gradient Boosting, Random Forest (RF), Nu Support Vector Regression (NuSVR), Artificial Neural Network (ANN), and K-Nearest Neighbors (KNN).The definitions of such ML are as follows: 1. Linear regression is a statistical technique that uses the association between two or more quantitative variables to predict a response or outcome variable based on the others 30 .This methodology is frequently utilized in numerous disciplines.2. The decision tree (DT) is a flowchart-like model, where the internal node signifies an attribute test, each branch signifies a test result, and each leaf node provides a class label or prediction 41 .The DT is simple to grasp and analyze, and it can deal with both category and numerical data.Furthermore, the DT can be utilized for classification and regression problems, as well as a foundation for more complex ensemble approaches 42 .3. Random forest is an ensemble learning method that allows forecasts by combining many Decision trees.It generates many Decision trees and outputs the mode or mean prediction of each tree 42,43  www.nature.com/scientificreports/randomization into the tree-building process, the Random Forest avoids the overfitting propensity of Decision trees and increases generalization.4. Gradient boosting is a popular ensemble method for integrating numerous weak models to build a strong prediction model 44 .Gradient boosting also contains regularization techniques to prevent overfitting, as well as parallel processing and tree pruning, making it efficient and very accurate.It has gained traction in numerous machine learning competitions and is frequently utilized in the industry.5. Support vector regression is a machine learning theory that is founded on the notion of structural risk minimization 45 .Support vector regression methods are a promising data mining and knowledge-finding technique.In the 1990s, support vector regression was introduced as a nonlinear solution for regression analysis.It was created for pattern recognition tasks using the paradigm of statistical learning theory.Support vector regression has gathered considerable interest as it has been recognized to be very robust for modeling numerous complex problems 42 .6. K-nearest neighbour (KNN) is a widely used statistical method for pattern recognition and classification that is quick and easy to use in several different fields 46 .K-Nearest Neighbour is built on the premise that similar input points will produce similar outputs.These related input points are first sorted into groups in n-dimensional space.For new point output, the k points that are closest to the new point are selected, and the group with the highest number of points near the new point is found by analysis.After that, the points are tallied, and the group with the greatest number of points within the hypercube is approximated.Finally, the new point is allocated to that group, and the cluster prediction method is used to anticipate its output.7. Artificial neural network (ANN) is an ML model that was stimulated by the anatomy of the human brain 30,47 .
The ANN is made up of linked neurons that are organized into layers 31 .Each neuron processes input data and sends it to the next layer, allowing for complicated computations 48 .The ANN may learn from a variety of data and be applied to tasks such as classification, regression, and pattern recognition.Fully connected neural networks are useful in petroleum-related applications where the inputs and outputs have nonlinear correlations 32,49,50 .

Models evaluation and error analysis
For examining the model applicability and precision, several traditional statistical measures and graphical error analyses are used to assess the accuracy, validity, and reliability of the produced models, and predict the performance of the created machine learning algorithms 12,51 .Furthermore, relative error distribution graphs and Cross-plots were also employed.

Statistical error analysis
It is a vital tool to assess the model's performance relative to the experimental values 52 .In the current work, the developed paradigms' reliability was screened by some statistical principles involving (ARE%) 1 , (AARE%), (RMSE), (SD), and (R 2 ) 12 .The following formulas are used to compute these statistical parameters 3,4,37 .
(1) Cross plot.The predicted/experimental data points are plotted versus each other to assess the model's competence in the prediction of the experimental one 3 .The diagram capability is evaluated through the grade of data gathering around the equality line and the deviation from the 45° line.
Error distribution diagram.It displays the error distribution around the zero-error line 3,4 , where the benchmark is the neighboring data amount around the zero-deviation line 1 .The gathering of predictions around the zero-error line is very reasonable.
Cumulative frequency versus AARE%.It is the sum of classes in a frequency distribution (i.e.adding up a value and all of the values that came before it 4 .

Data acquisition and analysis
Reliable data were gathered from the published literature 51 .The dataset collected from literature relevant to the viscosities of CH 4 , N 2 , and natural gas mixture with their correlating parameter ranges are indicated in Table 1.
Methane and nitrogen viscosities are correlated relevant to reservoir temperature (T res ) and pressure (P res ), while the viscosity of the natural gas mixture is correlated relative to reservoir pressure, temperature, and gas density.The collected data were treated to develop an ANN model for viscosity calculation.
The dataset was graphically displayed, and the sampling distribution was explained using histograms.Figure 2A-C shows the histograms of the data sets for methane case (pressure, temperature, and viscosity) respectively.The datasets encompass a variety of pressure from 14.7 psia up to 25,000 psia (harsh conditions).However, most of the data points range from 14.7 and 10,000 psi.Furthermore, the data on temperature showed a large range of values (77-482 °F).The majority dataset of temperature falls between 100 and 350 °F.Regarding methane viscosity, the most common values varied from 0.011 to 0.05 cp. Figure 2D-F shows the histograms of the data sets for nitrogen viscosity.A variety of pressures are covered by the datasets, ranging from 14.7 psia up to 24,000 psia (harsh conditions) but most of the data points lie between 14.7 and 12,000 psi.Likewise, a large range was present in the temperature data (− 343 to 1880 °F).Most temperature data is in the range of − 340 to 350 °F.The most frequent values for methane viscosity ranged from 0.006 to 0.0.07cp.Figure 2G-I depict the histograms of pressure, density, temperature, and viscosity for mixture gas from the reported studies.The databank covers a range of pressure from 14.7 psia up to 25,000 psia (harsh conditions).However, most of the data points range from 14.7 to 10,000 psi.Similarly, the data on temperature showed a large range of values.(− 8.6 to 360 °F).The most frequent values for gas mixture viscosity ranged from 0.01 to 0.1cp.Furthermore, the density of gas mixture data had an extensive range of values (0.02 to 0.6 lb/ft 3 ).

Sensitivity analysis
Sensitivity analysis relies on the absolute relevancy factor (r) analysis and is used to evaluate the effect of input parameters on the viscosities of methane, nitrogen, and natural gas mixture 3,32 .The effect of the input parameter is directly correlated with its relevancy factor's absolute value as follows 4 : (i) subscript , The data point index; Ij,i, is jth input parameter; Ij, Corresponds to the average value of Ij, I; P and Ṕ, Correspond to the predicted value and its average, respectively.
Nitrogen and methane viscosities are directly proportional to the pressure with a correlation coefficient of 4% and 97% respectively, and inversely proportional to temperature with a correlation coefficient of 27% and 10% respectively.The viscosity of the natural gas mixture is directly proportional to the pressure with a correlation coefficient of 95%, temperature with a correlation coefficient of 52%, and gas density with a correlation coefficient of 80%. Figure 3a-c display the implications of independent correlating parameters (pressure, temperature, and gas density) on the viscosity values of N 2 , CH 4 , and gas mixture, where (P, T) has the highest impact in the case of N 2 , CH 4 viscosity, while gas density greatly affects in case of gas mixture viscosity 2,4 .Correlation coefficients range from (0-1), whereas 1.0 indicates a strong correlation, while 0.0 indicates no correlation between the input and the targeted variables 53,54 .Sensitivity analysis aims to measure the model's sensitivity towards the independent variables.The variable's influence on the prediction decreases with decreasing percent values for the designated independent variables.The analysis results assign the selection of independent variable sets for more precise prediction.A low-impact variable can be removed.The relative impact of the trained set variables may vary more in different data sets.The statistical parameters of the utilized datasets modeling through this study are summarized in Table 2.The important parameters are skewness, and kurtosis which are defined as: i. Skewness which signifies the property asymmetry degree relevant to its mean value.Skewness ranges from (− 1) to (0) to (+ 1), whereas zero skewness corresponds to the normal distribution, while (+ 1 and − 1) correspond to non-normal probability distribution.ii.Kurtosis describes the distribution of data with the normal probability shape.The normal distribution has zero kurtosis.Furthermore, compared to the normal distribution, the positive and negative kurtosis correspond to a more peaked and flatter distribution 1 .The pressure and temperature values greatly influence the gas viscosity.Consequently, statistical approaches were utilized to screen the highest meaningful factors.These statistical analyses were performed using Minitab software 55 .

The analysis of variance (ANOVA)
ANOVA is a statistical approach used in analyzing many problems in the upstream oil industry 34,56 .In this regard, Methane, nitrogen, and mixture gas were the three types of gases whose viscosity was studied.ANOVA was used to obtain a quantitative interpretation of the investigated parameters.The results of ANOVA are exhibited in Tables 3, 4, and 5.As displayed in the first analysis (methane viscosity), P, and T have a significant impact on the viscosity of methane based on the P-value of to a lesser extent than 0.05.Furthermore, pressure displays the greatest F-value (119,178.47),which indicates the stronger influence of such a parameter compared with the www.nature.com/scientificreports/temperature parameter.In contrast, for the second analysis (nitrogen viscosity), the temperature has a meaningful effect on the viscosity of nitrogen based on the P-value and the F-values of 0 and 25.99 respectively.According to the P-value of less than 0.05 for the third case (gas mixture), temperature, pressure, and gas density, all significantly affect the gas mixture's viscosity.Furthermore, the pressure displays the highest F-value (33,426.02)which indicates the strongest influence of the pressure parameter compared with other parameters, which also reflects its significance on gas viscosity.

Factorial design
Factorial design is an important statistical method for optimizing and simplifying laboratory experiments by investigating the effects of multiple controllable factors on interest responses 34,57 .These statistical evaluations of gas viscosity were conducted using Minitab software 55 .Additionally, the results of the factorial design are shown in the Pareto chart.The Pareto plot uses horizontal bars to show the influences of the parameters from maximum impact to smallest impact.To further indicate which factors are statistically significant, a reference line is drawn on the Pareto chart.In the current analysis, the assessment was achieved using pressure, temperature, and gas density.The factorial design was performed for the three cases: methane, nitrogen, and gas mixture.The factorial design-generated Pareto chart for the methane viscosity results is shown in Fig. 4a.Pressure is found to have the most significant effect on the performance of methane viscosity.Figure 4b describes the Pareto chart for the results of nitrogen viscosity, where temperature was found to have the most dominant performance in nitrogen viscosity.Figure 4c exhibits the Pareto chart generated by factorial design for the results of mixture gas viscosity.According to the Pareto graph, the factors that do better than the reference line are statistically significant.Therefore, it is apparent that pressure, gas density, and temperature were found to be the leading parameters respectively in the viscosity of the gas mixture.In conclusion, the factorial design statistical analysis agrees with the ANOVA test results.

Contour plots
A contour plot was employed to evaluate the influence of input variables on methane viscosity.Two parameters' results were scrutinized concurrently on the methane viscosity values.Figure 5a shows the contour plot for methane viscosity to analyze the mutual effects of pressure and temperature.The contour plot reveals that the highest values of methane viscosity > 7.0 cp were found in the elevated pressure (above 22,000 psi) and low-temperature values.In contrast, the lowest values of methane viscosity < 0.02 cp were found in the low pressure (above 3000 psi) and all ranges of temperatures.Likewise, Fig. 5b shows the surface plot for nitrogen viscosity.The valley in the lower region of the graph represents the highest values of nitrogen viscosity > 0.06 cp, which represents all ranges of pressure and lower values of temperature.At the constancy of pressure and increasing temperature, the

Main effect plots
Figure 6a shows the main effects plot for the methane viscosity.The red line horizontal line represents the mean value of methane viscosity.At a low-pressure level, the viscosity is below the mean value then methane viscosity increases with increasing pressure until reaches its maximum value at the highest level of pressure.In contrast, at a low-temperature level, the methane viscosity is above the mean value, then methane viscosity decreases with increasing temperature until reaches its lowest value at the highest level of temperature.The graph implies that pressure is the greatest powerful factor in methane viscosity.Likewise, the effects of P and T on nitrogen and gas mixture viscosity (Fig. 6b,c) respectively exhibit the same performance as in the case of methane.Furthermore, at a low level of gas density, the viscosity is lower than the mean value, then gas mixture viscosity improves with rising the pressure until reaches its maximum value at the highest level of gas density.

Development of machine learning (ML) models
In pursuit of accurate gas viscosity predictions crucial for optimizing petroleum processes, we undertook a comprehensive investigation employing a diverse set of machine learning models.These models were chosen for their ability to capture intricate nonlinear relationships and patterns within the available complex datasets.
In the development of the current model, 1664, 1672, and 968 data points were collected to model the viscosities of CH 4 , N 2 , and natural gas mixture respectively.The collected database was divided randomly into three subgroups as follows 4 .iii.Testing set An invisible, independent set is used to evaluate the trained network model's performance.

Linear regression
Linear Regression, a fundamental regression method, was applied to capture the potential linear relationship between the inputs and the target variables.Despite its simplicity, Linear Regression exhibited satisfactory results in predicting gas viscosity as displayed in Figs. 7, 8, 9, 10, 11, and 12, achieving a respectable R 2 score of 0.9437 and 0.9331 on the testing set for methane and natural gas mixtures respectively.The model's performance, while commendable, indicates the potential presence of nonlinear relationships that could be better captured by more complex models.However, Linear Regression's performance varied significantly when applied to Nitrogen gas (Figs. 9 and 10).In this case, the model exhibited a notable drop in performance, achieving a very poor R 2 score of 0.0736.This discrepancy in model performance suggests that Nitrogen's viscosity is influenced by factors that Linear Regression, primarily designed for linear relationships, might not effectively capture.The presence of complex nonlinear interactions in the Nitrogen dataset could explain this divergence.Consequently, this underscores the importance of selecting models that are well-suited to the underlying patterns within specific datasets.In the case of Nitrogen, more advanced modeling techniques are required to accurately represent the intricate relationships impacting gas viscosity.

Decision tree (DT)
Decision Trees offer a powerful framework for capturing intricate interactions within the data.Our Decision Tree model displayed impressive performance on the testing set as displayed in Figs.

Gradient boosting (GB)
Gradient Boosting is another sequential ensemble technique that iteratively improves predictions by focusing on samples that the previous iteration misclassified.GB adapts by adjusting subsequent models to correct for errors, culminating in a strong collective predictor.The Gradient Boosting model displayed the best performance, yielding an R 2 score of 0.9988, 0.9969, and 0.9918 for methane, natural gas mixtures, and nitrogen, respectively

Nu support vector regression (NuSVR)
NuSVR, a variant of Support Vector Regression, is adept at capturing complex nonlinear relationships.The NuSVR model achieved a respectable R 2 score of 0.9926 for methane on the testing set.While not as high as  some other models, NuSVR still demonstrated its ability to capture intricate patterns in the data and provide accurate predictions under various conditions.However, the performance of NuSVR did exhibit some degree of variability, most notably evident in its R 2 scores of 0.8855 for natural gas mixtures and 0.0146 for nitrogen as displayed in Figs.31, 32, 33, 34, 35, and 36.The relatively high R 2 score for natural gas mixtures underscores NuSVR's capacity to grasp the underlying intricacies within the data and make precise predictions.A score of 0.8855 implies that the model accounted for 88.55% of the variance in the natural gas mixtures' viscosity, indicating a substantial level of accuracy.It is noteworthy that the dataset for natural gas mixtures could involve more predictable patterns or perhaps align more closely with the underlying assumptions of the NuSVR model.Conversely, the notably lower R 2 score of 0.0146 for nitrogen suggests a diminished ability to capture the complex relationships specific to nitrogen's viscosity, accounting for only 1.46% of the variance.This discrepancy is  www.nature.com/scientificreports/indicative of the model's sensitivity to variations in data patterns and underscores the importance of understanding the specific characteristics of the dataset.In the case of nitrogen, the dataset may comprise more intricate, non-linear relationships that may challenge the model's predictive capabilities.Overall, NuSVR's performance emphasizes its adaptability, demonstrating a commendable ability to capture complex data patterns and provide precise predictions.However, the variation in its performance across different datasets underscores the model's sensitivity to the specific characteristics of the data it encounters.This nuanced understanding can inform future applications of NuSVR in scenarios with distinct data profiles.2, the ensemble techniques, Random Forest and Gradient Boosting, emerged as the most promising models for accurately predicting gas viscosity across varying conditions.The ensemble nature of these methods contributed to their robustness and adaptability to the complex  www.nature.com/scientificreports/interactions present in the data.These findings underscore the significance of leveraging sophisticated machine learning approaches to enhance the precision of viscosity predictions, thereby facilitating the optimization of gas processing and reservoir management in the petroleum industry.

Comparison of the performance of machine learning techniques (non-ANN models)
In this section, the performance of the RF, DT, Gradient Boosting, KNN, NuSVR, and linear regression were compared.Table 6 lists a summary of calculability efforts (such as CPU time and memories).It is shown that the proposed ML saves time.Moreover, Table 7 shows the tuned hyperparameters of each method in terms of the computational process of the developed model.The compared performance of RF, DT, Gradient Boosting, (KNN), and (NuSVR) was stated through cross plots and error distribution curves.Figure 43a displays the

Development of ANN models
An artificial neural network is a brain-like system that performs computational models on numeric inputs which are considered interconnected nodes (neurons) and generates a single complex functional output relevant to the inputs 2,31 .ANN is considered an artificial intelligence technique used in numerous science and technology  www.nature.com/scientificreports/fields that comprises learning, storing, and information recalling depending on the training dataset 11 .Intelligent algorithms such as (ANN), (PN), (GA), (GRN), and (GMDH) 2 ease the complexity of computational calculations by altering complex formulations to simple ones 3 .One type of neural network is the (MLF), in which neurons of hidden layers are determined, while the (MFFNN) is the most used neural network for function approximation systems 9 .A three-layered MFFNN encompasses input, hidden, and one output layer displayed in Fig. 46 used to approximate complex functions 37 .The characteristics of the ANN model for modeling the viscosities of CH 4 , N 2 ,

Development of ANN model for calculating methane viscosity
A logistic sigmoid with eight neurons in the hidden layer is the optimum transfer function.The selected optimum model predicts the methane viscosity with high precision.This model achieves R 2 = 0.99918, a standard deviation of 1.47, RMSE of 0.00038, MPE of − 0.01%, and MAPE of 1.10% as presented in Table 9.The complete description of the model as described in Table 8 indicates that the proposed model consists of 3 layers.The 1st layer has two neurons for the inputs (pressure and temperature), the hidden layer has 8-neurons, and the third layer has one neuron for the output (methane viscosity).The proposed ANN model for methane viscosity calculation can be illustrated as follows: For i = 1 to none of the neurons and for j = 1 to none of the inputs, the hidden layer inputs are computed using the following formula: where represents the normalized pressure and temperature respectively which can be formulated as: The viscosity of methane can be formulated as follows: The model weights and bias coefficients between the layers are explained in Table 10.www.nature.com/scientificreports/ of four layers.The first layer has two neurons for the inputs (pressure and temperature), two hidden layers have ten neurons for each, and the third layer has one neuron for the output (nitrogen viscosity).The proposed ANN model for calculating the nitrogen viscosity can be illustrated as develops.The inputs for the first hidden layer are calculated, where the normalized pressure and temperature can be expressed as: (13)  P n = 0.000081632P − 1.001184

Development of ANN model for calculating gas mixture viscosity
The optimum model includes one hidden layer with ten neurons besides the input and output layers, and the transfer function is Logistic Sigmoid.The selected optimum model predicts the gas mixture viscosity with high accuracy.This model achieves R 2 = 0.9998, a standard deviation of 1.75, RMSE of 0.00029, MPE of − 0.03%, and MAPE of 1.26.The complete description of the model as presented in Table 8 includes that the proposed model consists of three layers.The first layer has three neurons for the inputs (pressure, temperature, and gas density), one hidden layer has ten neurons, and the third layer has one neuron for the output (gas mixture viscosity).The ( 14) T n = 0.0008997T − 0.691408 www.nature.com/scientificreports/proposed ANN model for calculating the gas mixture viscosity can be designated as follows.The inputs for the first hidden layer are calculated using Eq. ( 9).The normalized pressure, temperature, and gas density can be expressed as: The viscosity of the gas mixture can be calculated by the following function: The weights and biases between the layers are explained in Tables 15 and 16.Figures 47, 48, and 49 show the outputs regression plots for training, validation, testing, and all datasets for the methane, nitrogen, and gas mixture viscosity model respectively.When the ANN outcomes match the targets, the data points should be close to the line of the unit slope for an optimal fit.In this work, the estimated values versus the experimental one fit well closely around the 45° lines for all data sets, including training, validation, and testing data which implies the strength of the proposed model 1 .The coefficient of determination values is more than 0.999, which demonstrates an outstanding arrangement between the predicted and measured values for the three proposed models 37 .
Furthermore, Fig. 50a-c shows the relative errors between the developed model's outputs and the consistent experimental values for the training, validation, and testing sets.Since there is no discernible error trend in the predictions, error distribution curves show that the suggested models are capable of accurately estimating natural gas viscosity over a broad range of P, and T conditions 3 .The accuracy of the developed model is indicated by a denser cloud of data points surrounding the zero-error line.Additionally, at high P, and T conditions, these curves show that the suggested model does not exhibit a significant error trend 3 .Furthermore, Fig. 51 illustrates the cumulative frequency of the ARE related to the developed ANN models.According to the figure, more than 90% of the estimated viscosity values for methane, nitrogen, and gas mixture by the ANN model had an absolute

The potential advantages and limitations of the present study
The present study potentially presents faster and more effective techniques for predicting the viscosities of methane, nitrogen, and natural gas mixtures under normal and harsh conditions, compared to traditional experimental approaches.The developed ML models can certainly be employed for new data, granting the potential to predict viscosities for various mixtures and conditions without the need for further experimental measurements.The established ML models in the present study can also serve as a foundation for further improvement of the predictive models.As more data becomes available, their accuracy and reliability can be enhanced.It is important to add that the proposed ML algorithms can be considered as a substitution of the experimental measurements of PVT properties if the experimental parameters are within the considered ranges.From a limitation point of view, the developed Machine learning models would be better at interpolating within the range of the training data rather than extrapolating beyond it.Therefore, the accuracy of the predictions may decrease when applied to conditions significantly different from those present in the training data.
Table 16.Weights and biases between the second hidden layer and the output layer.

Conclusion
Mathematical models based on machine learning models were used to predict the viscosities of methane, nitrogen, and gas mixtures at high temperatures and high-pressure conditions.Moreover, for effective gas viscosity prediction, several optimization tools including statistical and ML approaches were used.Afterward, several verification principles were implemented to evaluate the developed models.The ML models suggested in this research hold significant importance for researchers and industry specialists involved in natural gas simulation, processing, and recovery.In general, the following conclusion can be drawn: • The proposed ML models in this work effectively addressed the predicting gas viscosity under high-temper- ature and high-pressure circumstances.• A database consisting of 4304 sets of experimental data was collected from the literature before the develop- ment of the ML models.• Three novel correlations have also been developed for CH 4 , N 2 , and natural gas viscosities using ANN.The measured and ANN-estimated viscosity values showed excellent agreement with R 2 = 0.99 for testing data sets.• The performances of the ANN, RF, and Gradient boosting were outstanding prediction modeling during the training and testing phases of CH 4 , N 2 gas, and gas mixture viscosities.• The ANN modeling resulted in R 2 values of 0.99 and 0.99 for the training and testing of the CH 4 , N 2 gas, and gas mixture models, respectively.• Linear regression, NuSVR, and KNN were the least performance ML models for predicting CH 4 , N 2 gas, and gas mixture viscosity models, whereas Linear regression and NuSVR models were unable to predict N 2 gas viscosity.• The inclusive efficacy of the ML models, ordered from high ranking to lowest ranking, was ANN > Gradient Boosting > RF > DT > KNN > NuSVR > Linear Regression, suggesting that ANN, Gradient Boosting, RF, and DT perform better than KNN, NuSVR, and Linear Regression.• The study outcomes indicate that data-based ML could be a beneficial model for precisely predicting the viscosity of CH 4 , N 2 gas, and gas mixture under normal and hard conditions, gaining enhancement in the modeling of natural gas operations.

Data availability
The datasets used and/or analysed during the current study are available from the corresponding author on reasonable request.

Figure 5 .
Figure 5. Contour map of viscosity correlated with T and P: (a) methane viscosity, (b) nitrogen viscosity, and (c) gas mixture viscosity.

Figure 7 .
Figure 7. Cross plot for actual versus predicted values of methane viscosity (linear regression model).

Figure 8 .
Figure 8. Error distribution for linear regression model (methane case).

Figure 9 .
Figure 9. Cross plots of actual versus predicted values for nitrogen viscosity (linear regression model).

Figure 10 .
Figure 10.Error distribution with Q-Q plot for linear regression model (nitrogen case).

Figure 11 .
Figure 11.Cross plots of actual versus predicted values for natural gas mixtures viscosity (linear regression model).

Figure 12 .
Figure 12.Error distribution with Q-Q plot for linear regression model (natural gas mixtures case).

Figure 13 .
Figure 13.Cross plots of actual versus predicted values for methane viscosity (decision tree model).

Figure 14 .
Figure 14.Error distribution for decision tree model (methane case).

Figure 15 .
Figure 15.Cross plots of actual versus predicted values for Nitrogen viscosity (decision tree model).

Figure 16 .
Figure 16.Error distribution for decision tree model (nitrogen case).

Figure 17 .
Figure 17.Cross plots of actual versus predicted values for natural gas mixtures viscosity (decision tree model).

Figure 18 .
Figure 18.Error distribution for decision tree model (natural gas mixtures case).

Figure 21 .
Figure 21.Cross plots of actual versus predicted values for Nitrogen viscosity (random forest model).

Figure 22 .
Figure 22.Error distribution for random forest model (nitrogen case).

Figure 23 .
Figure 23.Cross plots of actual versus predicted values for natural gas mixtures viscosity (random forest model).

Figure 24 .
Figure 24.Error distribution for random forest model (natural gas mixtures case).

Figure 25 .
Figure 25.Cross plots of actual versus predicted values for methane viscosity (gradient boosting model).

Figure 33 .
Figure 33.Cross plots of actual versus predicted values for nitrogen viscosity (NuSVR model).

Figure 35 .
Figure 35.Cross plots of actual versus predicted values for natural gas mixtures viscosity (NuSVR Model).

Figure 37 .
Figure 37. Cross plots of actual versus predicted values for methane viscosity (KNN model).

Figure 39 .
Figure 39.Cross plots of actual versus predicted values for Nitrogen viscosity (KNN model).

Figure 41 .
Figure 41.Cross plots of actual versus predicted values for natural gas mixtures viscosity (KNN model).

Figure 42 .
Figure 42.Error distribution for KNN model (natural gas mixtures case).

Figure 43 .
Figure 43.Prediction performance of the ML techniques (non-ANN) for viscosity prediction of methane: (a) cross plot, (b) residual error distribution, and (c) cumulative error frequency.

Figure 44 .
Figure 44.Prediction performance of the ML techniques (non-ANN) for viscosity prediction of Nitrogen: (a) cross plot, (b) residual error distribution, and (c) cumulative error frequency.

Figure 45 .
Figure 45.Prediction performance of the ML techniques (non-ANN) for viscosity prediction of Gas mixture: (a) cross plot, (b) residual error distribution, and (c) cumulative error frequency.

Figure 46 .
Figure 46.Three-layered ANN model consists of (input, hidden, and output layers) for prediction viscosity of (a) methane, (b) nitrogen, (c) gas mixture.

Figure 50 .
Figure 50.Relative error distribution for (a) methane viscosity predicted by ANN model, (b) nitrogen viscosity predicted by ANN model, (c) gas mixture viscosity predicted by ANN model.

Figure 51 .
Figure 51.Cumulative frequency versus ARE% for the developed ANN models.
4raphical error evaluationThe model performance is visualized through graphical error analyses, where crossplots and error distribution are reported3.Three visualization techniques, including cross plots, cumulative frequency plots versus AARE%, and error distribution curves, were implemented to screen the model accuracy4.

Table 1 .
Ranges of correlating parameters for methane, nitrogen, and natural gas viscosities.

Table 3 .
Analysis of variance for methane viscosity.

Table 4 .
Analysis of variance for nitrogen viscosity.

Table 5 .
Analysis of variance for gas mixture viscosity.

Table 6 .
Summary of run time and memory usage for machine learning models used in this study.

Table 7 .
Optimized hyperparameters for machine learning models used in this study.

Table 11 .
The complete description of the model as presented in Table8includes that the proposed model consists

Table 8 .
The ANN model's characteristics of CH 4 , N 2 , and natural gas mixture viscosities.

Table 9 .
Statistics for selecting the optimum ANN model of methane viscosity.

Table 10 .
Model coefficients between the layers for methane viscosity modeling.

Table 11 .
Statistics for selecting the optimum ANN model of nitrogen viscosity.

Table 12 .
Model coefficients between the layers between the input and the first hidden layer.

Table 13 .
Model coefficients between the first and the second hidden layers.

Table 14 .
Model coefficients between the second hidden layer and the output layer.

Table 15 .
Weights and biases between the input layer and the hidden layer.error of less than 2%.Hence, these results demonstrate that ANN models had acceptable performance and could successfully estimate the gas viscosity for CH 4 , N 2 , and natural gas mixture.